Selforganizing Classification on the Reuters News Corpus
Authors
Abstract
In this paper we propose integrating a self-organizing map with semantic networks from WordNet for a text classification task on the new Reuters news corpus. The neural model is based on significance vectors and benefits from the presentation of document clusters. The hypernym relation in WordNet supplements the neural model in classification. We also analyse the relationship between news headlines and their contents in the new Reuters corpus through a series of experiments. This hybrid approach of neural self-organization and symbolic hypernym relationships achieves good classification rates on 100,000 full-text news articles. These results demonstrate that the approach can scale up to a large real-world task and shows considerable potential for text classification.
Similar resources
The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources
Reuters, the global information, news and technology group, has for the first time made available free of charge, large quantities of archived Reuters news stories for use by research communities around the world. The Reuters Corpus Volume 1 (RCV1) includes over 800,000 news stories typical of the annual English language news output of Reuters. This paper describes the origins of RCV1, the moti...
Classification automatique de textes basée sur une ontologie normée. Application du Extensible Business Reporting Language (XBRL) au Reuters Corpus Volume 1 (RCV1)
We demonstrate that applying a domain-specific ontology standard significantly improves Automated Text Classification (ATC). We use the Extensible Business Reporting Language (XBRL) to define a standard ontology and compare the performance of an ATC engine (IBM Classification Module v.8.6) against two other lists of concepts, namely simple and hierarchical. Our sample of financial news is extracte...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
The System of Engagement in a Sample of Prose Fiction and the News
Emerging within Systemic Linguistics, Appraisal/Evaluation is a framework for analyzing the language of evaluation, providing techniques for the systematic analysis of evaluation and stance as they operate in whole texts and in groupings of texts. There are three systems in the Appraisal framework: Attitude, Engagement, and Graduation. This study sets out to analyze the use of the system of Eng...
Modular Preference Moore Machines in News Mining Agents
This paper focuses on Hybrid Symbolic Neural Architectures that support the task of classifying textual information in learning agents. We give an outline of these symbolic and neural preference Moore machines. Furthermore, we demonstrate how they can be used in the context of information mining and news classification. Using the Reuters newswire text data, we demonstrate how hybrid symbolic an...
Journal title:
Volume, issue:
Pages: -
Publication date: 2002